Shanti S. Gupta and Jianjun Li


Abstract: Empirical Bayes inference problems involve the estimation of unknown functions (a density and its derivative). It is well known that this can be done through the kernel method, i.e., using a kernel of fixed index and a varying window bandwidth. In this paper, we introduce the kernel sequence method, which uses a sequence of kernel functions and allows the kernel index and the window bandwidth to vary simultaneously in the estimates. This method usually produces better estimates, since varying the kernels gives more flexibility.

We apply this method to the construction of a monotone empirical Bayes test for the general continuous one-parameter exponential family. The rule we construct is shown to have a rate of convergence of $(\ln n)^{3+\varepsilon}/n$ for any $\varepsilon > 0$. This rate is a substantial improvement over previous results and is much closer to $1/n$, which is proved here to be a lower bound for monotone empirical Bayes tests. Thus the rule has good large-sample behavior. Since the rule is monotone, it also performs well for small samples.


¹This research was supported in part by the US Army Research Office, Grant DAAD19-00-1-0502, at Purdue University.
AMS Classification: 62C12.

Keywords: Empirical Bayes, regret Bayes risk, optimal rate of convergence, minimax.
1. Introduction. Assume that $X$ is an observation from the distribution with density

$$f(x \mid \theta) = c(\theta)\exp\{\theta x\}h(x), \qquad -\infty \le a < x < b \le +\infty, \tag{1.1}$$

where $h(x)$ is continuous and positive for $x \in (a,b)$, and $\theta$ is the parameter, which is distributed according to an unknown prior $G$ on the parameter space $\Omega$, a subset of the natural parameter space $\{\theta : c(\theta) > 0\}$.

We consider the problem of testing the hypotheses $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$, where $\theta_0$ is known. The loss function is $L(\theta, 0) = \max\{\theta - \theta_0, 0\}$ for accepting $H_0$ and $L(\theta, 1) = \max\{\theta_0 - \theta, 0\}$ for accepting $H_1$. A test $\delta(x)$ is defined to be a measurable mapping from $(a,b)$ into $[0,1]$ such that $\delta(x) = P\{\text{accepting } H_1 \mid X = x\}$, i.e., $\delta(x)$ is the probability of accepting $H_1$ when $X = x$ is observed. Let $R(G,\delta)$ denote the Bayes risk of a test $\delta$ when $G$ is the prior distribution. Let $\varphi_G(x) = E[\theta \mid X = x]$. Given that $E[|\theta|] < \infty$, a Bayes test $\delta_G$ is

$$\delta_G(x) = \begin{cases} 1 & \text{if } \varphi_G(x) \ge \theta_0, \\ 0 & \text{if } \varphi_G(x) < \theta_0. \end{cases} \tag{1.2}$$


Because $\varphi_G(x)$ involves $G$, the above solution works only if the prior $G$ is known. If $G$ is unknown, this testing problem is formulated as a compound decision problem and the empirical Bayes approach is used. Let $X_1, X_2, \ldots, X_n$ be the observations from $n$ independent past experiences and let $X$ be the present observation. Based on $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$, an empirical Bayes rule $\delta_n(X, \mathbf{X}_n)$ can be constructed. The performance of $\delta_n$ is measured by $R(G,\delta_n) - R(G,\delta_G)$, where $R(G,\delta_n) = E[R(G,\delta_n \mid \mathbf{X}_n)]$. The quantity $R(G,\delta_n) - R(G,\delta_G)$ is referred to as the regret Bayes risk (or regret) in the literature.

Denote $\alpha_G(x) = \int c(\theta)\exp(\theta x)\,dG(\theta)$ and $\psi_G(x) = \int \theta c(\theta)\exp(\theta x)\,dG(\theta)$. Clearly $\varphi_G(x) = \psi_G(x)/\alpha_G(x)$, and $\varphi_G(x) \ge \theta_0$ if and only if $w(x) = \theta_0\alpha_G(x) - \psi_G(x) \le 0$. So the construction of $\delta_n$ involves the estimation of $\alpha_G(x)$ and $\psi_G(x)$. Since the marginal density of $X$ is $f_G(x) = h(x)\alpha_G(x)$ and $\psi_G(x) = \alpha_G'(x)$, this amounts to estimating a density and its derivative. This is usually done using the kernel method. In this paper, we introduce the kernel sequence method and apply it to obtain estimates of $\alpha_G(x)$ and $\psi_G(x)$. The kernel sequence method uses a sequence of kernel functions, and the kernel index and window bandwidth are allowed to vary simultaneously in the estimates. This method usually produces better estimates, since varying the kernels gives more flexibility.

Based on the estimates of $\alpha_G(x)$ and $\psi_G(x)$, we construct an empirical Bayes rule $\delta_n$ for the testing problem described above. We then show that, under the assumption $E[|\theta|] < \infty$, $\delta_n$ has a rate of convergence of $(\ln n)^{3+\varepsilon}/n$ for any $\varepsilon > 0$, which is a substantial improvement over previous results. This rate is much closer to $1/n$, which is proved here to be a lower bound for monotone empirical Bayes tests. Thus the rule has good large-sample behavior; since it is monotone, it also performs well for small samples.

Readers interested in the empirical Bayes approach may refer to the two introductory papers of Robbins (1956, 1964). For the above empirical Bayes testing problem, Johns and Van Ryzin (1972) made an early contribution. Van Houwelingen (1976) exploited the monotonicity of the problem and constructed monotone empirical Bayes tests, which achieve the rate $O(n^{-2r/(2r+1)}(\ln n)^2)$ if $E[|\theta|^{r+1}] < \infty$. Van Houwelingen also showed that his rules perform well for small samples since they are monotone. Karunamuni and Yang (1995) studied monotone rules and their asymptotic behavior. With the additional assumption $c_G \in [-A, A]$, they obtained the rate $O(n^{-2r/(2r+1)})$. Karunamuni (1996) attempted to find the optimal rate of convergence of monotone empirical Bayes rules, but the attempt did not succeed; see Liang (2000a, 2000b) and Gupta and Li (2000). Another related work is Stijnen (1985), who studied the asymptotic behavior of both monotone and non-monotone empirical Bayes rules.

This paper is organized as follows. In Section 2 we present a few preliminary results. In Section 3 we introduce the kernel sequence method. In Section 4 we construct the monotone empirical Bayes test $\delta_n$ and obtain its rate of convergence. Section 5 gives a lower bound, $n^{-1}$, for monotone empirical Bayes tests. Section 6 contains the proofs of the main results in Sections 4 and 5. The appendix provides the proofs of a few lemmas used in Section 6.


2. Preliminaries. We assume $\int|\theta|\,dG(\theta) < \infty$ throughout this paper. Under this assumption, $\alpha_G(x)$ and $\psi_G(x)$ exist for all $x \in (a,b)$, and they are infinitely differentiable on $(a,b)$. Furthermore, $\varphi_G'(x) \ge 0$, so $\varphi_G(x)$ is an increasing function. If $\lim_{x\downarrow a}\varphi_G(x) \ge \theta_0$, then $\varphi_G(x) \ge \theta_0$ and $\delta_G(x) = 1$ for all $x \in (a,b)$; if $\lim_{x\uparrow b}\varphi_G(x) < \theta_0$, then $\varphi_G(x) < \theta_0$ and $\delta_G(x) = 0$ for all $x \in (a,b)$. In both cases, we say that $\delta_G$ is degenerate. In what follows we assume that $\delta_G$ is non-degenerate, i.e., $\lim_{x\downarrow a}\varphi_G(x) < \theta_0 < \lim_{x\uparrow b}\varphi_G(x)$. Then $G$ is non-degenerate and $\varphi_G'(x) > 0$. Therefore there exists a unique point $c_G \in (a,b)$ such that $\varphi_G(x) > \theta_0$ for $x > c_G$, $\varphi_G(x) = \theta_0$ for $x = c_G$, and $\varphi_G(x) < \theta_0$ for $x < c_G$ (see Van Houwelingen (1976) and others). Recall that $w(x) = \theta_0\alpha_G(x) - \psi_G(x) = \alpha_G(x)[\theta_0 - \varphi_G(x)]$; since $\alpha_G(x) > 0$, $c_G$ is the unique root of $w(x)$.

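Based on the previous discussion, the Bayes rule stated in Section 1 can be represented as

$$\delta_G(x) = \begin{cases} 1 & \text{if } \varphi_G(x) \ge \theta_0 \iff w(x) \le 0 \iff x \ge c_G, \\ 0 & \text{if } \varphi_G(x) < \theta_0 \iff w(x) > 0 \iff x < c_G. \end{cases} \tag{2.1}$$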
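To make (2.1) concrete, the following minimal Python sketch computes $\alpha_G$, $\psi_G$, $w$ and the critical point $c_G$ for a known discrete prior. It assumes the normal family $N(\theta, 1)$, written in the form (1.1) with $c(\theta) = e^{-\theta^2/2}$ and $h(x) = e^{-x^2/2}/\sqrt{2\pi}$; the prior, the search bracket and all function names are illustrative choices, not part of the paper. Since $w > 0$ to the left of $c_G$ and $w \le 0$ to the right, bisection on the sign of $w$ locates $c_G$.

```python
import math

# Hypothetical illustration: the N(theta, 1) family in the form (1.1), with
# c(theta) = exp(-theta^2 / 2) and h(x) = exp(-x^2 / 2) / sqrt(2 pi).
# A prior G is a list of (theta_k, p_k) pairs (a discrete distribution).

def c(theta):
    return math.exp(-theta * theta / 2.0)

def alpha_G(x, prior):
    """alpha_G(x) = integral of c(theta) exp(theta x) dG(theta)."""
    return sum(p * c(t) * math.exp(t * x) for t, p in prior)

def psi_G(x, prior):
    """psi_G(x) = integral of theta c(theta) exp(theta x) dG(theta)."""
    return sum(p * t * c(t) * math.exp(t * x) for t, p in prior)

def phi_G(x, prior):
    """Posterior mean E[theta | X = x] = psi_G(x) / alpha_G(x)."""
    return psi_G(x, prior) / alpha_G(x, prior)

def w(x, prior, theta0):
    """w(x) = theta0 * alpha_G(x) - psi_G(x); positive left of c_G, <= 0 right of it."""
    return theta0 * alpha_G(x, prior) - psi_G(x, prior)

def critical_point(prior, theta0, lo=-20.0, hi=20.0, tol=1e-12):
    """Locate the unique root c_G of w by bisection (assumes a non-degenerate rule)."""
    if not (w(lo, prior, theta0) > 0 > w(hi, prior, theta0)):
        raise ValueError("w does not change sign on [lo, hi]; delta_G may be degenerate")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if w(mid, prior, theta0) > 0:
            lo = mid  # still to the left of c_G
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bayes_test(x, c_G):
    """delta_G(x) as in (2.1): 1 (accept H1) iff x >= c_G."""
    return 1 if x >= c_G else 0
```

For the prior putting mass 1/2 at $\theta = 0$ and at $\theta = 2$ with $\theta_0 = 1$, the posterior mean equals 1 exactly at $x = 1$, so `critical_point([(0.0, 0.5), (2.0, 0.5)], 1.0)` returns a value within about $10^{-12}$ of 1.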

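Since the Bayes rule $\delta_G$ is characterized by the single number $c_G$, a monotone empirical Bayes test (MEBT) can be constructed by estimating $c_G$ by some $c_n = c_n(X_1, X_2, \ldots, X_n)$ and defining

$$\delta_n(x) = \begin{cases} 1 & \text{if } x \ge c_n, \\ 0 & \text{if } x < c_n. \end{cases} \tag{2.2}$$

Then the regret of $\delta_n$ is

$$R(G,\delta_n) - R(G,\delta_G) = E\left[\int_{c_n}^{c_G} w(x)h(x)\,dx\right]. \tag{2.3}$$

Since $w(x) > 0$ for $x < c_G$ and $w(x) < 0$ for $x > c_G$, the inner integral is non-negative whichever side of $c_G$ the cutoff $c_n$ falls on.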
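Formula (2.3) can be checked numerically. The sketch below works in a hypothetical setting of our own choosing, the normal family $N(\theta, 1)$ with prior mass 1/2 at $\theta = 0$ and at $\theta = 2$ and $\theta_0 = 1$, so that $c_G = 1$. For a fixed cutoff $c_n$ it compares the Bayes-risk difference of two threshold rules, computed directly from the linear loss of Section 1, with a Simpson approximation of $\int_{c_n}^{c_G} w(x)h(x)\,dx$. The two agree, and both are positive on either side of $c_G$.

```python
import math

# Hypothetical setting (not from the paper): N(theta, 1), prior
# G = 0.5 * delta_0 + 0.5 * delta_2, theta0 = 1, for which c_G = 1 exactly.
PRIOR = [(0.0, 0.5), (2.0, 0.5)]
THETA0 = 1.0
C_G = 1.0

def h(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def w(x):
    """w(x) = theta0 * alpha_G(x) - psi_G(x) with c(theta) = exp(-theta^2 / 2)."""
    return sum(p * (THETA0 - t) * math.exp(-t * t / 2.0 + t * x) for t, p in PRIOR)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bayes_risk_threshold(c):
    """Bayes risk of 'accept H1 iff x >= c' under L(theta,0) = (theta-theta0)^+, L(theta,1) = (theta0-theta)^+."""
    risk = 0.0
    for t, p in PRIOR:
        p_accept_h1 = 1.0 - Phi(c - t)  # P_theta(X >= c)
        risk += p * (max(THETA0 - t, 0.0) * p_accept_h1
                     + max(t - THETA0, 0.0) * (1.0 - p_accept_h1))
    return risk

def regret_formula(c_n, m=2000):
    """Right-hand side of (2.3) for fixed c_n: Simpson's rule for int_{c_n}^{c_G} w(x) h(x) dx."""
    a, b = c_n, C_G
    step = (b - a) / m  # negative when c_n > c_G, which keeps the orientation right
    total = w(a) * h(a) + w(b) * h(b)
    for i in range(1, m):
        x = a + i * step
        total += (4 if i % 2 else 2) * w(x) * h(x)
    return total * step / 3.0
```

Differentiating `bayes_risk_threshold` in $c$ gives $-w(c)h(c)$, which is why the two computations coincide.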




Remark 2.1. The assumption that $\delta_G$ is non-degenerate is not crucial in this empirical Bayes testing problem. It can be relaxed for particular members of the family (1.1); see Gupta and Li (2000).

3. Kernel sequence method. The kernel method has been used by many authors over the years. Here we introduce the kernel sequence method, which uses a sequence of kernel functions instead of a single one. As the number of observations $n$ increases, the kernel function and the kernel window bandwidth vary simultaneously.

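For each $i = 0, 1$ and $m = 1, 2, \ldots$, let $K_{im}(y)$ be a Borel-measurable function such that $K_{im}(y)$ vanishes outside the interval $[A_{im}, B_{im}]$, and, for $K_{0m}(y)$,

$$\int y^j K_{0m}(y)\,dy \begin{cases} = 1 & \text{if } j = 0, \\ = 0 & \text{if } j = 1, 2, \ldots, m-1, \ldots, k_{0m}-1, \\ \ne 0 & \text{if } j = k_{0m}, \end{cases} \tag{3.1}$$

and, for $K_{1m}(y)$,

$$\int y^j K_{1m}(y)\,dy \begin{cases} = 0 & \text{if } j = 0, 2, 3, \ldots, m, \ldots, k_{1m}-1, \\ = 1 & \text{if } j = 1, \\ \ne 0 & \text{if } j = k_{1m}. \end{cases} \tag{3.2}$$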

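Let $u = u_n$ be a sequence of positive numbers and $v = v_n$ a sequence of positive integers. For any $x \in (a,b)$, define

$$\alpha_n(x) = \frac{1}{nu}\sum_{j=1}^n \frac{K_{0v}\left(\frac{X_j - x}{u}\right)}{h(X_j)}, \qquad \psi_n(x) = \frac{1}{nu^2}\sum_{j=1}^n \frac{K_{1v}\left(\frac{X_j - x}{u}\right)}{h(X_j)}.$$

For properly chosen $u$ and $v$, $\alpha_n(x)$ and $\psi_n(x)$ are estimates of $\alpha_G(x)$ and $\psi_G(x)$, respectively. In these kernel estimates, $u$ is called the kernel (window) bandwidth and $v$ is called the kernel index.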
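The estimates above translate directly into code. The following sketch is illustrative only: it uses one fixed pair of kernels, $K_0(y) = (9 - 15y^2)/8$ and $K_1(y) = 3y/2$ on $[-1, 1]$, which satisfy the moment conditions (3.1) with $k_{0m} = 4$ and (3.2) with $k_{1m} = 3$, rather than a kernel sequence. It simulates data from a hypothetical normal family $N(\theta, 1)$ with prior mass 1/2 at $\theta = \pm 1$, so that $\alpha_G(x) = e^{-1/2}\cosh x$ and $\psi_G(x) = e^{-1/2}\sinh x$. The bandwidth, sample size, seed and function names are arbitrary choices.

```python
import math
import random

def K0(y):
    # (9 - 15 y^2)/8 on [-1, 1]: integrates to 1; moments j = 1, 2, 3 vanish.
    return (9.0 - 15.0 * y * y) / 8.0 if -1.0 <= y <= 1.0 else 0.0

def K1(y):
    # 3y/2 on [-1, 1]: first moment equals 1; moments j = 0, 2 vanish.
    return 1.5 * y if -1.0 <= y <= 1.0 else 0.0

def h(x):
    # h(x) for N(theta, 1) written as c(theta) exp(theta x) h(x)
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def _kernel_sum(K, x, xs, u):
    total = 0.0
    for xj in xs:
        t = (xj - x) / u
        if abs(t) <= 1.0:  # kernels vanish outside [-1, 1]; skip h(X_j) there
            total += K(t) / h(xj)
    return total

def alpha_n(x, xs, u):
    """alpha_n(x) = (1/(n u)) * sum_j K0((X_j - x)/u) / h(X_j)."""
    return _kernel_sum(K0, x, xs, u) / (len(xs) * u)

def psi_n(x, xs, u):
    """psi_n(x) = (1/(n u^2)) * sum_j K1((X_j - x)/u) / h(X_j)."""
    return _kernel_sum(K1, x, xs, u) / (len(xs) * u * u)

def simulate(n, seed=12345):
    """Draw theta from the prior (mass 1/2 at -1 and +1), then X ~ N(theta, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice((-1.0, 1.0)), 1.0) for _ in range(n)]
```

For example, with `xs = simulate(100_000)`, `alpha_n(0.3, xs, 0.4)` should be close to $e^{-1/2}\cosh 0.3 \approx 0.634$ and `psi_n(0.3, xs, 0.4)` close to $e^{-1/2}\sinh 0.3 \approx 0.185$, the latter with noticeably more noise because of the extra factor $1/u$.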

Note that the kernel indices of the functions $K_{0v}$ and $K_{1v}$ change as $n$ increases. This differs from the traditional kernel method with a fixed index: here both the kernel indices and the window bandwidths vary in the construction.


4. MEBT for the General Exponential Family. We use the kernel sequence method to find estimators of $\alpha_G(x)$ and $\psi_G(x)$, and then construct $c_n$ based on these estimators.

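We now present the two sequences of kernel functions used in this paper. Define $K_{0v}$ as follows: for odd $v$, $K_{0v}(y) = K_{0(v+1)}(y)$; for even $v$,

$$K_{0v}(y) = \begin{cases} p_vy^v + p_{v-1}y^{v-1} + \cdots + p_0, & \text{if } -1 \le y \le 1, \\ 0, & \text{otherwise}, \end{cases} \tag{4.1}$$

where $p_i = 0$ if $i$ is odd and, for even $i$, $p_i$ is given by an explicit closed-form expression (see Gasser, Müller and Mammitzsch (1985)).

Define $K_{1v}(y)$ as follows: for even $v$, $K_{1v}(y) = K_{1(v+1)}(y)$; for odd $v$,

$$K_{1v}(y) = \begin{cases} q_vy^v + q_{v-1}y^{v-1} + \cdots + q_0, & \text{if } -1 \le y \le 1, \\ 0, & \text{otherwise}, \end{cases} \tag{4.2}$$

where $q_i = 0$ if $i$ is even and, for odd $i$, $q_i$ is given by an explicit closed-form expression (see Gasser, Müller and Mammitzsch (1985)).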

Then $K_{0v}(y)$ defined by (4.1) satisfies (3.1) with $A_{0v} = -1$, $B_{0v} = 1$, $k_{0v} = v$ if $v$ is even and $k_{0v} = v + 1$ if $v$ is odd; $K_{1v}(y)$ defined by (4.2) satisfies (3.2) with $A_{1v} = -1$, $B_{1v} = 1$, $k_{1v} = v$ if $v$ is odd and $k_{1v} = v + 1$ if $v$ is even; see Gasser, Müller and Mammitzsch (1985).
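The coefficients $p_i$ and $q_i$ in (4.1) and (4.2) follow Gasser, Müller and Mammitzsch (1985). As a sketch, assuming these are the minimum-variance polynomial kernels, i.e., for a given order $k$ the unique polynomial of degree at most $k-1$ on $[-1,1]$ with $\int y^jK(y)\,dy = 1$ if $j = \nu$ and $0$ otherwise for $j = 0, \ldots, k-1$ (our assumption, with $\nu = 0$ mimicking (3.1) and $\nu = 1$ mimicking (3.2)), the coefficients can be computed exactly by solving the $k$ linear moment equations. All helper names are hypothetical.

```python
from fractions import Fraction

def moment(j: int) -> Fraction:
    """Integral of y**j over [-1, 1]."""
    return Fraction(0) if j % 2 else Fraction(2, j + 1)

def solve_exact(A, b):
    """Gauss-Jordan elimination over Fractions; A must be square and nonsingular."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def poly_kernel(nu: int, k: int):
    """Coefficients c_0, ..., c_{k-1} of K(y) = sum_i c_i y**i on [-1, 1] with
    int y**j K(y) dy = 1 if j == nu else 0, for j = 0, ..., k - 1."""
    A = [[moment(i + j) for i in range(k)] for j in range(k)]  # Gram (Hankel) matrix
    b = [Fraction(1) if j == nu else Fraction(0) for j in range(k)]
    return solve_exact(A, b)

def kernel_moment(coeffs, j: int) -> Fraction:
    """int y**j K(y) dy over [-1, 1] for the polynomial kernel K."""
    return sum((c * moment(i + j) for i, c in enumerate(coeffs)), Fraction(0))

def integral_sq(coeffs) -> Fraction:
    """int K(y)**2 dy over [-1, 1]."""
    return sum((ci * cj * moment(i + j)
                for i, ci in enumerate(coeffs)
                for j, cj in enumerate(coeffs)), Fraction(0))
```

Under this assumption, `poly_kernel(0, 4)` gives $K(y) = 9/8 - 15y^2/8$, with $\int y^4K(y)\,dy \ne 0$ and $\int K^2 = 9/8$, and `poly_kernel(1, 5)` gives $75y/8 - 105y^3/8$, with $\int K^2 = 75/8$. Exact rational arithmetic avoids the ill-conditioning of the moment matrix that floating point would suffer for large $k$.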

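Let $\varepsilon_n$ be a sequence of positive numbers with $\varepsilon_n \to 0$. Denote $u = u_n = \varepsilon_n^{1/3}$. Let $v = v_n$ be a sequence of integers such that $u^v \sim n^{-1}$. For any $x \in (a,b)$, define

$$\alpha_n(x) = \frac{1}{nu}\sum_{j=1}^n K_{0v}\left(\frac{X_j - x}{u}\right)\frac{1}{h(X_j)}, \qquad \psi_n(x) = \frac{1}{nu^2}\sum_{j=1}^n K_{1v}\left(\frac{X_j - x}{u}\right)\frac{1}{h(X_j)}. \tag{4.3}$$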

It is shown later that $\alpha_n(x)$ and $\psi_n(x)$ are consistent estimators of $\alpha_G(x)$ and $\psi_G(x)$, respectively. Therefore $W_n(x) = \theta_0\alpha_n(x) - \psi_n(x)$ is a consistent estimator of $w(x)$.
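The choices $u = \varepsilon_n^{1/3}$ and $u^v \sim n^{-1}$ can be turned into numbers. The sketch below assumes $\varepsilon_n = (\ln n)^{-\varepsilon}$, a choice we make for illustration because it matches the rate in Theorem 4.1. It rounds $v$ from $\ln n/\ln(1/u)$ and then resets $u = n^{-1/v}$, so that $u^v = 1/n$ holds exactly with an integer $v$. It also reports $d_n = \sqrt{v^3/(nu^3)}$, the threshold used in Lemma 6.4. The function name `tuning` is hypothetical.

```python
import math

def tuning(n: int, eps: float):
    """Hypothetical (u, v, eps_n, d_n) for sample size n, assuming eps_n = (ln n)^(-eps).

    Section 4 asks for u = eps_n^(1/3) and an integer v with u^v ~ 1/n.  Here v is
    rounded from ln n / ln(1/u) and u is then reset to n^(-1/v), so that u^v = 1/n
    (up to rounding error) while eps_n = u^3 stays close to (ln n)^(-eps).
    """
    if n < 3 or eps <= 0:
        raise ValueError("need n >= 3 (so that ln n > 1) and eps > 0")
    ln_n = math.log(n)
    u_target = ln_n ** (-eps / 3.0)          # eps_n^(1/3) with eps_n = (ln n)^(-eps)
    v = max(1, round(ln_n / math.log(1.0 / u_target)))
    u = n ** (-1.0 / v)                      # forces u^v = 1/n
    eps_n = u ** 3
    d_n = math.sqrt(v ** 3 / (n * u ** 3))   # threshold d_n of Lemma 6.4
    return u, v, eps_n, d_n
```

With this choice $v$ grows like $3\ln n/(\varepsilon\ln\ln n)$, so the kernel index increases with $n$ while $u$ shrinks only slowly; this is the joint variation of index and bandwidth that distinguishes the kernel sequence method.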
Since $c_G$ is the unique root of $w(x)$, we use $W_n(x)$ to construct $c_n$. Before doing so, let us examine $\delta_G$. Note that $\delta_G$ is a monotone rule: if $x$ is larger than $c_G$, we accept $H_1$; if $x$ is smaller than $c_G$, we accept $H_0$. Since $G$ is unknown, we do not know at which point we should switch from accepting $H_0$ to rejecting it. However, one is more likely to accept $H_1$ if the present observation $x$ is quite large, and to accept $H_0$ if it is quite small. With this in mind, we want to find two numbers $c_{1n}$ and $c_{2n}$ such that we accept $H_1$ if we observe $x > c_{2n}$ and accept $H_0$ if we observe $x < c_{1n}$. Both cutoff points $c_{1n}$ and $c_{2n}$ depend on $n$: as $n$ increases, the accumulated data carry more information, and the cutoffs should be updated so that the decision can be made more precisely. Once proper $c_{1n}$ and $c_{2n}$ are found, we can concentrate our effort on $x \in [c_{1n}, c_{2n}]$ in the construction.

The idea of splitting $(a,b)$ into $(a, c_{1n})$, $[c_{1n}, c_{2n}]$ and $(c_{2n}, b)$ is called the localization technique. To implement it, we need the following lemma.

Lemma 4.1. Four sequences of numbers $\{a_n\}$, $\{\tilde a_n\}$, $\{b_n\}$ and $\{\tilde b_n\}$ can be found such that $a_n \downarrow a$, $b_n \uparrow b$, and, for large $n$,

(i) $-[(\ln\ln n) \wedge u^{-1}] \le a_n < b_n \le [(\ln\ln n) \wedge u^{-1}]$;

(ii) $\min_{a_n \le x \le b_n} h(x) \ge u$;

(iii) $\int_{\tilde a_n}^{a_n} h(t)\,dt \ge 2u$ and $\int_{b_n}^{\tilde b_n} h(t)\,dt \ge 2u$.

Let $c_{1n} = a_n + u + u^{1/3}$ and $c_{2n} = b_n - u - u^{1/3}$. From Lemma 4.1, $c_{1n} \to a$ and $c_{2n} \to b$, so $c_G$ falls in $[c_{1n}, c_{2n}]$ for large $n$. We then define $c_n$ as

$$c_n = \int_{c_{1n}}^{c_{2n}} I[W_n(x) > 0]\,dx + c_{1n}. \tag{4.4}$$
A monotone empirical Bayes test $\delta_n(x)$ is now proposed as follows:

$$\delta_n(x) = \begin{cases} 1 & \text{if } x \ge c_n, \\ 0 & \text{if } x < c_n. \end{cases} \tag{4.5}$$

Clearly $c_n \in [c_{1n}, c_{2n}]$. So if $x > c_{2n}$ we accept $H_1$, and if $x < c_{1n}$ we accept $H_0$. If $x \in [c_{1n}, c_{2n}]$, we calculate $c_n$ and compare $x$ with $c_n$ to make the decision.
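The integral in (4.4) is what keeps the rule monotone even when $W_n$ crosses zero several times. The sketch below approximates (4.4) by a midpoint Riemann sum for an arbitrary callable `W` standing in for $W_n$. It contrasts the resulting threshold rule, accept $H_1$ iff $x \ge c_n$, with the pointwise plug-in rule, accept $H_1$ iff $W(x) \le 0$. The grid size `m`, the wiggly test function and all names are illustrative assumptions.

```python
import math

def cutoff_cn(W, c1n, c2n, m=10_000):
    """Midpoint approximation of c_n = c_{1n} + int_{c_{1n}}^{c_{2n}} I[W(x) > 0] dx."""
    step = (c2n - c1n) / m
    positive = sum(1 for i in range(m) if W(c1n + (i + 0.5) * step) > 0)
    return c1n + positive * step

def mebt(x, c_n):
    """Monotone empirical Bayes decision: 1 (accept H1) iff x >= c_n."""
    return 1 if x >= c_n else 0

def plug_in(x, W):
    """Pointwise rule: 1 (accept H1) iff W(x) <= 0; not monotone in general."""
    return 1 if W(x) <= 0 else 0

def W_wiggly(x):
    # Hypothetical noisy estimate of w(x) = 1.3 - x whose sign flips several times near 1.3.
    return 1.3 - x + 0.2 * math.sin(25.0 * x)
```

For $W(x) = 1.3 - x$ on $[0, 2]$, `cutoff_cn` returns 1.3 up to the grid spacing. For `W_wiggly`, the plug-in decisions on a fine grid switch from 1 back to 0 near $x = 1.2$ to $1.3$, while the decisions `mebt(x, cutoff_cn(W_wiggly, 0.0, 2.0))` are non-decreasing in $x$ by construction.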

The localization technique helps us avoid the boundary effect of kernel estimates. It gives good bounds on the moments of $W_n(x)$ for $x \in [c_{1n}, c_{2n}]$ (see Lemma 6.3 below). It also yields a useful lower bound on $|w(x)|$ for $x \in [c_{1n}, c_G - \epsilon_G] \cup [c_G + \epsilon_G, c_{2n}]$ with $\epsilon_G > 0$ (see Lemma 6.2 below), which is crucial for obtaining the desired rate of convergence in Section 6. For other uses of this technique, see Gupta and Li (1999a, 1999b, 2000) and Li and Gupta (2000).

Since $W_n(x)$ is an estimate of $w(x)$, a natural construction of the empirical Bayes rule would be $\delta_n = 1$ if $W_n(x) \le 0$ and $\delta_n = 0$ if $W_n(x) > 0$. Unfortunately, this construction leads to a non-monotone rule. So we use the integral of $I[W_n(x) > 0]$ in (4.4) instead. This technique is borrowed from Brown, Cohen and Strawderman (1976), Van Houwelingen (1976) and Stijnen (1985).

Now we study the large-sample behavior of $\delta_n$. The next two lemmas enable us to express the regret of $\delta_n$ in terms of $c_n - c_G$.

Lemma 4.2. $w'(c_G) < 0$.

Since $w'(x)$ is continuous on $(a,b)$, we can find a neighborhood $N_{\epsilon_G}(c_G)$ of $c_G$ such that $N_{\epsilon_G}(c_G) \subset (c_{1n}, c_{2n}) \subset (a,b)$ for large $n$, and $A_\epsilon = \min_{x \in N_{\epsilon_G}(c_G)}[-w'(x)] > 0$. In the following, denote $\eta_1 = c_G - \epsilon_G$ and $\eta_2 = c_G + \epsilon_G$.
Lemma 4.3. Let $\bar h = \sup\{h(x) : x \in [\eta_1, \eta_2]\}$ and $\bar w = \sup\{-w'(x) : x \in [\eta_1, \eta_2]\}$. Then

$$R(G,\delta_n) - R(G,\delta_G) \le \tfrac12\bar h\bar w\,E(c_n - c_G)^2 + (|\theta_0| + E[|\theta|])\,\epsilon_G^{-4}\,E(c_n - c_G)^4.$$


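Following (4.4) and $c_G \in [c_{1n}, c_{2n}]$, we have

$$c_n - c_G = -\int_{c_{1n}}^{c_G} I[W_n(x) \le 0]\,dx + \int_{c_G}^{c_{2n}} I[W_n(x) > 0]\,dx.$$

So a bound on $|c_n - c_G|$ can be obtained from the properties of $W_n(x)$ and $w(x)$. Note that $W_n(x)$ can be written as

$$W_n(x) = \frac1n\sum_{j=1}^n V_n(X_j, x), \qquad \text{where } V_n(y, x) = \frac{\theta_0}{u}\cdot\frac{K_{0v}\left(\frac{y - x}{u}\right)}{h(y)} - \frac{1}{u^2}\cdot\frac{K_{1v}\left(\frac{y - x}{u}\right)}{h(y)}.$$

For fixed $n$ and $x$, the $V_n(X_j, x)$, $j = 1, \ldots, n$, are i.i.d. random variables, so $W_n(x)$ is an average of i.i.d. random variables. Applying the results in Petrov (1995), we obtain the following result.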


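Lemma 4.4. $\lim_{n\to\infty}[n\varepsilon_n(\ln n)^{-3}E(c_n - c_G)^2] = 0$ and $\lim_{n\to\infty}[n\varepsilon_n(\ln n)^{-3}E(c_n - c_G)^4] = 0$.

The proofs of Lemmas 4.1-4.4 are given in Section 6. Lemmas 4.3 and 4.4 show that the regret is $o((\ln n)^3/(n\varepsilon_n))$; taking $\varepsilon_n = (\ln n)^{-\varepsilon}$ leads to the following theorem.

Theorem 4.1. Assume that $\int|\theta|\,dG(\theta) < \infty$ and the Bayes rule $\delta_G$ is non-degenerate. Then for any $\varepsilon > 0$, $R(G,\delta_n) - R(G,\delta_G) = o((\ln n)^{3+\varepsilon}/n)$.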

Remark 4.1. In this paper, we obtain a faster rate of convergence for the general exponential family. This is mainly due to the use of a kernel sequence in the construction of the estimate of $w(x)$. Previous papers constructed empirical Bayes rules from kernel estimates with a fixed kernel function and a varying window bandwidth, and the resulting rates are not as fast. Here we let the kernel function and the window bandwidth vary simultaneously, and a better rate of convergence is obtained.


Remark 4.2. To apply the kernel sequence method, a key question is how to construct the sequence of kernel functions. In this paper we use the results of Gasser, Müller and Mammitzsch (1985). We expect that the rate obtained here could be improved if a “better” kernel sequence were found.

Remark 4.3. Note that the rule $\delta_n$ is monotone. It therefore has the weak admissibility property (see Van Houwelingen (1976)), so it also performs well for small samples.

Remark 4.4. The rate in Theorem 4.1 holds for the general distribution (1.1). For a particular member of the exponential family, the special properties of that member may be incorporated into the construction, and a better rate can possibly be obtained; see Liang (2000a, 2000b) and Gupta and Li (2000).

5. Lower bound. We shall prove that $1/n$ is a lower bound for any MEBT, even if $\theta$ is bounded.

As shown in Section 2, the problem of constructing a monotone empirical Bayes rule is essentially equivalent to finding an estimator $c_n^*$ of $c_G$, a functional of the marginal density $f_G(x)$ of $X$, based on the i.i.d. sample $X_1, \ldots, X_n$. So a lower bound for MEBTs can be found by obtaining a lower bound on the rate at which $c_n^*$ converges to $c_G$. This will be done using ideas from Donoho and Liu (1991) or Fan (1991) and then carefully constructing the hardest two-point subproblem. In the following, $l, l_1, l_2, \ldots$ denote positive constants, which may take different values on different occasions.

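Let $\mathcal{G}$ be the set of prior distributions with supports inside $[\theta_0 - \theta_d, \theta_0 + \theta_d] \subset \Omega$ for some $\theta_d > 0$. Let $\mathcal{C}$ be the set of estimators $c_n^*$ of $c_G$ ($a < c_n^* < b$) and $\mathcal{D}$ be the set of empirical Bayes rules of type (2.2) with $c_n = c_n^* \in \mathcal{C}$. In order to find a minimax lower bound for MEBTs over $\mathcal{G}$, we first define $\mathcal{G}_0$, a subset of $\mathcal{G}$.

Denote $\theta_{01} = \theta_0 - \theta_d/2$ and $\theta_{02} = \theta_0 + \theta_d/2$. Choose any $c_0 \in (a,b)$. Let

$$g_0(\theta) = m_0\exp(-\theta c_0)/c(\theta)\cdot I[\theta_{01} \le \theta \le \theta_{02}], \qquad g_1(\theta) = m_1\exp(-\theta x_d)\,g_0(\theta),$$

where (i) $m_i$ is the normalizing constant satisfying $\int g_i(\theta)\,d\theta = 1$ for $i = 0, 1$; (ii) $x_d$ satisfies $w_0'(x) \le \frac12 w_0'(c_0) < 0$ for all $x \in [c_0 - x_d, c_0 + x_d] \subset (a,b)$, where $w_0(x)$ is $w(x)$ associated with $G \sim g_0$ (i.e., $dG(\theta) = g_0(\theta)\,d\theta$). Let $\mathcal{F} = \{f_G(x) = \int f(x \mid \theta)\,dG(\theta) : G \in \mathcal{G}_0\}$, where

$$\mathcal{G}_0 = \left\{G : G \sim g = (1 + \sqrt m)^{-1}\left[\sqrt m\,g_1(\theta) + g_0(\theta)\right],\ m = 0, 1, \ldots, \infty\right\}.$$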

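The next lemma shows that finding a lower bound for MEBTs reduces to finding a lower bound for the hardest two-point subproblem.

Lemma 5.1. Let $c_i$ be the critical point corresponding to $f_i$, $i = 1, 2$. Then

$$\inf_{\delta^* \in \mathcal{D}}\sup_{G \in \mathcal{G}}[R(G,\delta^*) - R(G,\delta_G)] \ge \inf_{\delta^* \in \mathcal{D}}\sup_{G \in \mathcal{G}_0}[R(G,\delta^*) - R(G,\delta_G)] \ge l_1\sup\left\{(c_1 - c_2)^2 : \int\left[\sqrt{f_1(x)} - \sqrt{f_2(x)}\right]^2dx \le l/n,\ f_1, f_2 \in \mathcal{F}\right\}.$$

Lemma 5.1 is proved using a result of Donoho and Liu (1991). By this lemma, we need to identify $f_1$ and $f_2$ in $\mathcal{F}$ to find the minimax lower bound.

Lemma 5.2. Let $g_2(\theta) = (1 + \sqrt n)^{-1}[\sqrt n\,g_1(\theta) + g_0(\theta)]$ and $f_i(x) = \int f(x \mid \theta)g_i(\theta)\,d\theta$ for $i = 1, 2$. Then $f_1, f_2 \in \mathcal{F}$ and

$$\int\left[\sqrt{f_1(x)} - \sqrt{f_2(x)}\right]^2dx \le l_1/n, \qquad (c_2 - c_1)^2 \ge l_2/n.$$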

As a consequence of Lemmas 5.1 and 5.2, we have the following theorem.

Theorem 5.1. For some $l > 0$, $\inf_{\delta^* \in \mathcal{D}}\sup_{G \in \mathcal{G}}[R(G,\delta^*) - R(G,\delta_G)] \ge l/n$.
Remark 5.1. A natural question for empirical Bayes inference problems is: what is a lower (or the best lower) bound for monotone empirical Bayes rules for the general exponential family? For the empirical Bayes estimation problem, Singh (1979) conjectured that $n^{-1}$ is a lower bound and that it is not attainable even if $\theta$ is bounded. For the testing problem, we now know that $n^{-1}$ is a lower bound for monotone empirical Bayes rules.

Remark 5.2. Since the optimal rate of monotone rules for $N(\theta, 1)$ is $(\ln n)^{1.5}/n$ (see Gupta and Li (2000)), $n^{-1}$ may not be the best or an attainable lower bound for the general exponential family (1.1). We also believe that it is not possible to find an attainable lower bound for the whole family (1.1) at once; it must be done for each distribution individually, incorporating the information specific to that distribution.

6. Proofs. We now prove the results of the previous sections. First we state some lemmas which will be used in this section; their proofs are provided in the appendix.

6.1. Some lemmas. For large $n$, the following lemmas hold.

Lemma 6.1. Let $\bar\alpha_n = \max\{\alpha_G(x) : x \in [a_n, b_n]\}$. Then $\bar\alpha_n \le (2u)^{-1}$.

Lemma 6.2. For $x \in [c_{1n}, c_{2n}]$, $|w(x)| \le 2/u^2$. For $x \in [c_{1n}, \eta_1] \cup [\eta_2, c_{2n}]$, $|w(x)| \ge M\cdot u(\ln n)^{-B}$, where $M > 0$ and $B > 0$.

Lemma 6.3. Let $w_n(x) = E[V_n(X_j, x)]$, $Z_{jn} = V_n(X_j, x) - w_n(x)$, $\sigma_n^2(x) = E[Z_{jn}^2]$ and $\gamma_n(x) = E[|Z_{jn}|^3]$. Then

(i) for $x \in [c_{1n}, c_{2n}]$, $|w_n(x) - w(x)| \le 1/\sqrt n$;

(ii) for $x \in [c_{1n}, c_{2n}]$, $\sigma_n(x) \le l_1v^{3/2}u^{-5/2}$; for $x \in [\eta_1, \eta_2]$, $l_2 \le \sigma_n(x) \le l_3(v/u)^{3/2}$;

(iii) for $x \in [c_{1n}, c_{2n}]$, $\gamma_n(x) \le l_4v^{13}36^vu^{-6}$.

Lemma 6.4. Let $d_n = \sqrt{v^3/(nu^3)}$. For $x \in [c_{1n}, c_{2n}]$,

$$w(x) \ge d_n \implies w_n(x) \ge w(x)/2, \qquad w(x) \le -d_n \implies w_n(x) \le w(x)/2.$$


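6.2. Proof of Lemma 4.1. Lemma 4.1 is intuitively obvious, but we also give a rigorous proof here. Let $h(a+) = \lim_{x\downarrow a}h(x)$ and $h(b-) = \lim_{x\uparrow b}h(x)$. Choose any $\xi \in (a,b)$. Let

$$h_a = \begin{cases} \max\{a < x < \xi : h(x) \le u\} & \text{if } h(a+) = 0, \\ a & \text{if } 0 < h(a+) < \infty, \end{cases} \qquad h_b = \begin{cases} \min\{\xi < x < b : h(x) \le u\} & \text{if } h(b-) = 0, \\ b & \text{if } 0 < h(b-) < \infty, \end{cases}$$

$$S_a = \begin{cases} \max\{a < x < \xi : \int_a^x h(t)\,dt \le 2u\} & \text{if } \int_a^\xi h(t)\,dt < \infty, \\ a & \text{if } \int_a^\xi h(t)\,dt = \infty, \end{cases}$$

$$S_b = \begin{cases} \min\{\xi < x < b : \int_x^b h(t)\,dt \le 2u\} & \text{if } \int_\xi^b h(t)\,dt < \infty, \\ b & \text{if } \int_\xi^b h(t)\,dt = \infty. \end{cases}$$

Then we define $a_n$ and $b_n$ as follows:

$$a_n = h_a \vee S_a \vee (a + 1/n) \vee (-\ln\ln n) \vee (-u^{-1}), \qquad b_n = h_b \wedge S_b \wedge (b - 1/n) \wedge (\ln\ln n) \wedge (u^{-1}).$$

And let

$$\tilde a_n = \begin{cases} a & \text{if } \int_a^\xi h(t)\,dt < \infty, \\ x_a \in \{a < x < \xi : \int_x^{a_n} h(t)\,dt \ge 2u\} & \text{if } \int_a^\xi h(t)\,dt = \infty, \end{cases}$$

$$\tilde b_n = \begin{cases} b & \text{if } \int_\xi^b h(t)\,dt < \infty, \\ x_b \in \{\xi < x < b : \int_{b_n}^x h(t)\,dt \ge 2u\} & \text{if } \int_\xi^b h(t)\,dt = \infty. \end{cases}$$

Then it is easy to see that $a_n \downarrow a$, $b_n \uparrow b$, and (i), (ii) and (iii) in Lemma 4.1 hold.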
6.3. Proof of Lemma 4.2. Note that $\alpha_G(x)$ is infinitely differentiable, $\alpha_G'(x) = \psi_G(x)$ and $w'(x) = \theta_0\psi_G(x) - \psi_G'(x)$. If $\psi_G(c_G) = 0$, then $\theta_0 = \varphi_G(c_G) = 0$ and $w'(c_G) = -\int\theta^2c(\theta)e^{\theta c_G}\,dG(\theta) < 0$. If $\psi_G(c_G) > 0$, then by Jensen's inequality $\psi_G'(c_G)/\psi_G(c_G) > \psi_G(c_G)/\alpha_G(c_G) = \theta_0$. Thus $w'(c_G) < 0$.

Similarly, if $\psi_G(c_G) < 0$, then $w'(c_G) < 0$. The proof of Lemma 4.2 is complete.

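6.4. Proof of Lemma 4.3. From (2.3),

$$R(G,\delta_n) - R(G,\delta_G) \le E\left[I[|c_n - c_G| \ge \epsilon_G]\int_{c_n}^{c_G} w(x)h(x)\,dx\right] + \bar h\,E\left[I[|c_n - c_G| < \epsilon_G]\int_{c_n}^{c_G} w(x)\,dx\right] \le (|\theta_0| + E[|\theta|])\,\epsilon_G^{-4}E(c_n - c_G)^4 + \tfrac12\bar h\bar w\,E(c_n - c_G)^2,$$

where $\int|w(x)|h(x)\,dx \le |\theta_0| + E[|\theta|]$, Markov's inequality gives $P(|c_n - c_G| \ge \epsilon_G) \le \epsilon_G^{-4}E(c_n - c_G)^4$, and, by Taylor expansion, for some $c_n^*$ between $c_n$ and $c_G$,

$$I[|c_n - c_G| < \epsilon_G]\int_{c_n}^{c_G} w(x)\,dx = -\tfrac12 w'(c_n^*)(c_n - c_G)^2\,I[|c_n - c_G| < \epsilon_G] \le \tfrac12\bar w(c_n - c_G)^2.$$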


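6.5. Proof of Lemma 4.4. From (4.4),

$$E(c_n - c_G)^2 \le E\left[\int_{c_{1n}}^{c_G} I[W_n(x) \le 0]\,dx\right]^2 + E\left[\int_{c_G}^{c_{2n}} I[W_n(x) > 0]\,dx\right]^2 = r_{1n} + r_{2n}. \tag{6.1}$$

By Hölder's inequality and a little algebra,

$$r_{1n} \le 2(c_{2n} - c_{1n})I_1 + 2I_2 + 2I_3, \tag{6.2}$$

where $I_1 = \int_{c_{1n}}^{\eta_1} P(W_n(x) \le 0)\,dx$, $I_2 = \left(\int_{\eta_1}^{c_G} I[w(x) \le d_n]\,dx\right)^2$ and $I_3 = E\left[\int_{\eta_1}^{c_G} I[W_n(x) \le 0,\ w(x) > d_n]\,dx\right]^2$.

For $w(x) > d_n$, $w_n(x) \ge w(x)/2$ by Lemma 6.4. Then

$$P(W_n(x) \le 0) = P\left(-\frac{1}{\sqrt n\,\sigma_n}\sum_{j=1}^n Z_{jn} \ge \frac{\sqrt n\,w_n(x)}{\sigma_n}\right) \le P\left(-\frac{1}{\sqrt n\,\sigma_n}\sum_{j=1}^n Z_{jn} \ge \frac{\sqrt n\,w(x)}{2\sigma_n}\right).$$

Applying Theorem 5.16 on page 168 of Petrov (1995) to the right-hand side of the above inequality,

$$P(W_n(x) \le 0) \le \Phi\left(-\frac{\sqrt n\,w(x)}{2\sigma_n(x)}\right) + \frac{A\,\gamma_n(x)}{\sigma_n^3(x)\sqrt n\left(1 + \frac{\sqrt n\,w(x)}{2\sigma_n(x)}\right)^3} = S_n(x) + T_n(x), \tag{6.3}$$

where $A$ is a constant and $\Phi(\cdot)$ is the cdf of $N(0,1)$. For $x \in [c_{1n}, \eta_1]$, $w(x) \ge Mu(\ln n)^{-B}$, and certainly $w(x) > d_n$ for large $n$. Also note that $\sigma_n(x) \le l_1u^{-5/2}v^{3/2}$ and $\gamma_n(x) \le l_4v^{13}36^vu^{-6}$. It follows that $S_n(x) \le \Phi(-n^{1/4})$ and $T_n(x) \le n^{-3/2}$ for large $n$. Thus

$$(c_{2n} - c_{1n})I_1 = (c_{2n} - c_{1n})\int_{c_{1n}}^{\eta_1} P(W_n(x) \le 0)\,dx = o(1/n). \tag{6.4}$$

For $x \in [\eta_1, c_G]$, $|w'(x)| \ge A_\epsilon$. Thus $I_2 \le A_\epsilon^{-2}\left(\int_{\eta_1}^{c_G} I[w(x) \le d_n]\,|w'(x)|\,dx\right)^2$. Letting $y = w(x)/d_n$, $I_2 \le A_\epsilon^{-2}d_n^2\left(\int_0^\infty I[y \le 1]\,dy\right)^2 = A_\epsilon^{-2}d_n^2$. Since $d_n^2 = v^3/(n\varepsilon_n)$ and $v \sim \ln n/\ln(1/u) = o(\ln n)$, therefore

$$I_2 = O(d_n^2) = o((\ln n)^3/(n\varepsilon_n)). \tag{6.5}$$

By Hölder's inequality again,

$$I_3 \le \int_{\eta_1}^{c_G} P(W_n(x) \le 0)[w(x)]^{3/2}I[w(x) > d_n]\,dx \times \int_{\eta_1}^{c_G}[w(x)]^{-3/2}I[w(x) > d_n]\,dx.$$

Letting $y = w(x)/d_n$, $\int_{\eta_1}^{c_G}[w(x)]^{-3/2}I[w(x) > d_n]\,dx \le 2/(A_\epsilon d_n^{1/2})$. Using the previous two inequalities and (6.3), we have

$$I_3 \le \frac{2}{A_\epsilon d_n^{1/2}}\left\{\int_{\eta_1}^{c_G} S_n(x)[w(x)]^{3/2}I[w(x) > d_n]\,dx + \int_{\eta_1}^{c_G} T_n(x)[w(x)]^{3/2}I[w(x) > d_n]\,dx\right\}. \tag{6.6}$$

For $x \in [\eta_1, c_G]$, $l_2 \le \sigma_n(x) \le l_3\sqrt{v^3/u^3}$ and $\gamma_n(x) \le l_4v^{13}36^vu^{-6}$. Therefore, with $y = w(x)/d_n$,

$$\int_{\eta_1}^{c_G} S_n(x)[w(x)]^{3/2}I[w(x) > d_n]\,dx \le \frac{d_n^{5/2}}{A_\epsilon}\int_1^\infty \Phi\left(-\frac{y}{2l_3}\right)y^{3/2}\,dy, \tag{6.7}$$

and

$$\int_{\eta_1}^{c_G} T_n(x)[w(x)]^{3/2}I[w(x) > d_n]\,dx \le \frac{8Al_4v^{13}36^v}{A_\epsilon u^6n^2}\int_{d_n}^\infty w^{-3/2}\,dw = \frac{16Al_4v^{13}36^v}{A_\epsilon u^6n^2d_n^{1/2}}. \tag{6.8}$$

Combining (6.6)-(6.8), we have $I_3 = o((\ln n)^3/(n\varepsilon_n))$. This, together with (6.4) and (6.5), yields $r_{1n} = o((\ln n)^3/(n\varepsilon_n))$. Similarly, $r_{2n} = o((\ln n)^3/(n\varepsilon_n))$. Hence $E(c_n - c_G)^2 = o((\ln n)^3/(n\varepsilon_n))$ and, similarly, $E(c_n - c_G)^4 = o((\ln n)^3/(n\varepsilon_n))$. This completes the proof of Lemma 4.4.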


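6.6. Proof of Lemma 5.1. Let $w_1(x) = w(x)$ with $G \sim g_1$. Then $w_1(x) = m_1w_0(x - x_d)$ and $c_1 = c_0 + x_d$. Since $w_1(c_0) > 0$ and $w_0(c_1) < 0$, $c_G \in [c_0, c_1]$ for $G \in \mathcal{G}_0$. Since $w_0'(x) \le \frac12 w_0'(c_0)$ for $x \in [c_0 - x_d, c_0 + x_d]$, we have $-w'(x) \ge -(m_1 \wedge 1)w_0'(c_0)/2 = \underline{w} > 0$ for $x \in [c_0, c_1]$ and $G \in \mathcal{G}_0$.

Let $\tilde{\mathcal{C}} = \{c^* \vee c_0 \wedge c_1 : c^* \in \mathcal{C}\}$. For $c^* \in \mathcal{C}$, denote $\tilde c_n = c^* \vee c_0 \wedge c_1$. Note that $h(x)$ is bounded away from zero on $[c_0, c_1]$. Then for any $G \in \mathcal{G}_0$, $\int_{c^*}^{c_G} w(x)h(x)\,dx \ge l_1\int_{\tilde c_n}^{c_G} w(x)\,dx$. From (2.3),

$$\inf_{\delta^* \in \mathcal{D}}\sup_{G \in \mathcal{G}_0}[R(G,\delta^*) - R(G,\delta_G)] \ge l_1\inf_{c^* \in \mathcal{C}}\sup_{G \in \mathcal{G}_0}E\left[\int_{\tilde c_n}^{c_G} w(x)\,dx\right].$$

By Taylor expansion, $\int_{\tilde c_n}^{c_G} w(x)\,dx = -\frac12 w'(c^{**})(\tilde c_n - c_G)^2 \ge \frac12\underline{w}(\tilde c_n - c_G)^2$ for some $c^{**}$ between $\tilde c_n$ and $c_G$. Therefore

$$\inf_{c^* \in \mathcal{C}}\sup_{G \in \mathcal{G}_0}E\left[\int_{\tilde c_n}^{c_G} w(x)\,dx\right] \ge l_2\inf_{\tilde c_n \in \tilde{\mathcal{C}}}\sup_{G \in \mathcal{G}_0}E[(\tilde c_n - c_G)^2].$$

Since $\tilde{\mathcal{C}} \subset \mathcal{C}$,

$$\inf_{\tilde c_n \in \tilde{\mathcal{C}}}\sup_{G \in \mathcal{G}_0}E[(\tilde c_n - c_G)^2] \ge \inf_{c_n^* \in \mathcal{C}}\sup_{G \in \mathcal{G}_0}E[(c_n^* - c_G)^2].$$

From the results in Donoho and Liu (1991) (Theorem 3.1 and the remark after Lemma 3.3),

$$\inf_{c_n^* \in \mathcal{C}}\sup_{G \in \mathcal{G}_0}E[(c_n^* - c_G)^2] \ge l_3\sup\left\{(c_1 - c_2)^2 : \int\left[\sqrt{f_1(x)} - \sqrt{f_2(x)}\right]^2dx \le l/n,\ f_1, f_2 \in \mathcal{F}\right\}.$$

Then Lemma 5.1 follows.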

6.7. Proof of Lemma 5.2. Note that $f_2(x) - f_1(x) = (1 + \sqrt n)^{-1}[-f_1(x) + f_0(x)]$, where $f_0(x) = \int c(\theta)\exp(\theta x)h(x)g_0(\theta)\,d\theta$. For all $x \in (a,b)$,

$$f_0(x)[f_1(x)]^{-1} = \left[\int_{\theta_{01}}^{\theta_{02}}\exp(\theta(x - c_0))\,d\theta\right]\cdot\left[m_1\int_{\theta_{01}}^{\theta_{02}}\exp(\theta(x - x_d - c_0))\,d\theta\right]^{-1} \le l.$$

Then $\int\left[\sqrt{f_1(x)} - \sqrt{f_2(x)}\right]^2dx \le \int[f_1(x) - f_2(x)]^2/f_1(x)\,dx \le (1 + l)/n$.

Denote $w_2(x) = w(x)$ with $G \sim g_2$. Then $w_2(x) = (1 + \sqrt n)^{-1}[\sqrt n\,m_1w_0(x - x_d) + w_0(x)]$. Note that $|w_2'(x)| \le l$ for $x \in (c_0, c_1)$ and $|w_2(c_1)|^2 = [w_2(c_2) - w_2(c_1)]^2 \le l^2(c_2 - c_1)^2$. Then $(c_2 - c_1)^2 \ge l_4|w_2(c_1)|^2 = l_4(1 + \sqrt n)^{-2}[w_0(c_1)]^2$. The proof of Lemma 5.2 is now complete.


Appendix.

Lemma A.1. The following statements hold.

(i) $|K_{iv}(y)| \le kv^{10}36^v$, $i = 0, 1$, where $k$ is some constant.

(ii) $v^{-1}\int|K_{0v}(y)|^2\,dy \to \pi^{-1}$.

(iii) $v^{-3}\int|K_{1v}(y)|^2\,dy \to (3\pi)^{-1}$.

Proof. Part (i) is obtained by simple calculations and is omitted here. From our definition of $K_{0v}$ and $K_{1v}$ and Theorem 1 of Gasser, Müller and Mammitzsch (1985), for even $v$,

$$\int_{-1}^1 K_{0v}^2(y)\,dy = \frac{v^2[(v-1)!!]^2}{2[v!!]^2}, \qquad \int_{-1}^1 K_{1v}^2(y)\,dy = \frac{v^2(v+1)^2[(v-1)!!]^2}{6[v!!]^2}.$$

Since $s[(2s-1)!!]^2/[(2s)!!]^2 \to \pi^{-1}$ as $s \to \infty$, (ii) and (iii) follow. The case of odd $v$ can be proved similarly.
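Parts (ii) and (iii) reduce to Wallis-type limits. Taking as given the even-$v$ closed forms $\int K_{0v}^2 = v^2[(v-1)!!]^2/(2[v!!]^2)$ and $\int K_{1v}^2 = v^2(v+1)^2[(v-1)!!]^2/(6[v!!]^2)$ (our reading of the display in the proof, which presumes minimum-variance kernels), the sketch below evaluates them exactly for small $v$. It also checks numerically that $v^{-1}\int K_{0v}^2 \to \pi^{-1}$ and $v^{-3}\int K_{1v}^2 \to (3\pi)^{-1}$. It uses the identity $(2s-1)!!/(2s)!! = \binom{2s}{s}/4^s$ together with `math.lgamma` to avoid huge integers. The function names are hypothetical.

```python
import math
from fractions import Fraction

def double_factorial(k: int) -> int:
    """k!! with the convention 0!! = (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def int_K0_sq(v: int) -> Fraction:
    """Closed form v^2 [(v-1)!!]^2 / (2 [v!!]^2) for even v."""
    return Fraction(v * v * double_factorial(v - 1) ** 2, 2 * double_factorial(v) ** 2)

def int_K1_sq(v: int) -> Fraction:
    """Closed form v^2 (v+1)^2 [(v-1)!!]^2 / (6 [v!!]^2) for even v."""
    return Fraction(v * v * (v + 1) ** 2 * double_factorial(v - 1) ** 2,
                    6 * double_factorial(v) ** 2)

def wallis_ratio(s: int) -> float:
    """(2s-1)!!/(2s)!! = C(2s, s) / 4^s, computed in log space."""
    return math.exp(math.lgamma(2 * s + 1) - 2 * math.lgamma(s + 1) - 2 * s * math.log(2.0))

def scaled_limits(v: int):
    """(v^{-1} int K_{0v}^2, v^{-3} int K_{1v}^2) for large even v, in floating point."""
    s = v // 2
    r2 = wallis_ratio(s) ** 2
    return s * r2, (v + 1) ** 2 / (6.0 * v) * r2
```

For $v = 2$ the formulas give $1/2$ and $3/2$, the values for the uniform kernel on $[-1, 1]$ and for $3y/2$. At $v = 200000$ both scaled values are within about $10^{-6}$ of $1/\pi$ and $1/(3\pi)$, consistent with the $O(1/v)$ correction in Wallis' formula.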


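Proof of Lemma 6.1. Note that $\alpha_G''(x) = \int\theta^2c(\theta)e^{\theta x}\,dG(\theta) \ge 0$ for $x \in (a,b)$. Then $\alpha_G(x)$ is a convex function and $\bar\alpha_n = \alpha_G(a_n) \vee \alpha_G(b_n)$. We prove $\alpha_G(a_n) \le (2u)^{-1}$ below; the proof of $\alpha_G(b_n) \le (2u)^{-1}$ is similar. Since $c(\theta) = 1/\int_a^b h(x)e^{\theta x}\,dx$ and $\alpha_G(a_n) = \int c(\theta)e^{\theta a_n}\,dG(\theta)$, it follows that

$$\alpha_G(a_n) \le \int_{[\theta \ge 0]}\left[\int_{b_n}^{\tilde b_n} h(x)\exp(\theta(x - a_n))\,dx\right]^{-1}dG(\theta) + \int_{[\theta < 0]}\left[\int_{\tilde a_n}^{a_n} h(x)\exp(\theta(x - a_n))\,dx\right]^{-1}dG(\theta).$$

Note that, by Lemma 4.1, $\int_{b_n}^{\tilde b_n}\exp(\theta(x - a_n))h(x)\,dx \ge 2u$ for $\theta \ge 0$ and $\int_{\tilde a_n}^{a_n}\exp(\theta(x - a_n))h(x)\,dx \ge 2u$ for $\theta < 0$. Then Lemma 6.1 holds.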


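Proof of Lemma 6.2. Since $\psi_G(x) = \int\theta c(\theta)\exp(\theta x)\,dG(\theta)$ and $u|\theta| \le \exp(u|\theta|)$,

$$|\psi_G(x)| \le u^{-1}\left[\int_{[\theta > 0]} c(\theta)\exp(\theta(x + u))\,dG(\theta) + \int_{[\theta < 0]} c(\theta)\exp(\theta(x - u))\,dG(\theta)\right].$$

For $x \in [c_{1n}, c_{2n}]$, both $x + u$ and $x - u$ lie in $[a_n, b_n]$, so by Lemma 6.1, $\alpha_G(x \pm u) \le 1/(2u)$. Then $|\psi_G(x)| \le 1/u^2$ and $|w(x)| \le 2/u^2$ for large $n$. Choose $B > 0$ such that $\int_{[|\theta| \le B]} dG(\theta) > 0$ and denote $\Omega_B = \Omega \cap [|\theta| \le B]$. Since $1/c(\theta)$ is a convex function of $\theta$ on $\Omega$, $c(\theta)$ is bounded on $\Omega_B$. Thus $\int_{\Omega_B} c(\theta)\,dG(\theta)$ is finite and positive.

Recall that $w(x) = \alpha_G(x)[\theta_0 - \varphi_G(x)]$. Since $\varphi_G(x)$ is increasing and $\varphi_G(c_G) = \theta_0$, for $x \in [c_{1n}, \eta_1]$, $\theta_0 - \varphi_G(x) \ge \theta_0 - \varphi_G(\eta_1) > 0$; for $x \in [\eta_2, c_{2n}]$, $\varphi_G(x) - \theta_0 \ge \varphi_G(\eta_2) - \theta_0 > 0$. For $x \in [c_{1n}, c_{2n}]$, $|x| \le \ln\ln n$ and

$$\alpha_G(x) \ge \int_{\Omega_B} c(\theta)\exp(-|\theta|\ln\ln n)\,dG(\theta) \ge (\ln n)^{-B}\int_{\Omega_B} c(\theta)\,dG(\theta).$$

Let $M = \{[\theta_0 - \varphi_G(\eta_1)] \wedge [\varphi_G(\eta_2) - \theta_0]\}\cdot\int_{\Omega_B} c(\theta)\,dG(\theta)$. Since $u < 1$ for large $n$, Lemma 6.2 is proved.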


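Proof of Lemma 6.3. We prove (i) for even $v$ only; the case of odd $v$ is similar. Using a Taylor expansion of $e^{\theta ut}$, simple calculations show that

$$E\left[\frac1u K_{0v}\left(\frac{X_j - x}{u}\right)\frac{1}{h(X_j)}\right] = \int c(\theta)e^{\theta x}\,dG(\theta) + u^v\int\frac{\theta^v}{v!}c(\theta)e^{\theta x}\left[\int_{-1}^1 K_{0v}(t)t^ve^{\theta ut\tau_1}\,dt\right]dG(\theta)$$

and

$$E\left[\frac{1}{u^2}K_{1v}\left(\frac{X_j - x}{u}\right)\frac{1}{h(X_j)}\right] = \int\theta c(\theta)e^{\theta x}\,dG(\theta) + u^v\int\frac{\theta^{v+1}}{(v+1)!}c(\theta)e^{\theta x}\left[\int_{-1}^1 K_{1v}(t)t^{v+1}e^{\theta ut\tau_2}\,dt\right]dG(\theta),$$

where $|\tau_1|, |\tau_2| \le 1$. Then $E[V_n(X_j, x)] = w(x) + u^{v/2}d_n(x)$, where

$$d_n(x) = \theta_0u^{v/2}\int\frac{\theta^v}{v!}c(\theta)e^{\theta x}\left[\int_{-1}^1 K_{0v}(t)t^ve^{\theta ut\tau_1}\,dt\right]dG(\theta) - u^{v/2}\int\frac{\theta^{v+1}}{(v+1)!}c(\theta)e^{\theta x}\left[\int_{-1}^1 K_{1v}(t)t^{v+1}e^{\theta ut\tau_2}\,dt\right]dG(\theta).$$

Since $(u^{1/3}|\theta|)^v/v! \le \exp(|\theta|u^{1/3})$ and $(u^{1/3}|\theta|)^{v+1}/(v+1)! \le \exp(|\theta|u^{1/3})$, for $x \in [c_{1n}, c_{2n}]$,

$$|d_n(x)| \le u^{v/6-1}\int c(\theta)e^{\theta x + |\theta|u + |\theta|u^{1/3}}\,dG(\theta)\cdot\left[|\theta_0|\int_{-1}^1|K_{0v}(t)|\,dt + \int_{-1}^1|K_{1v}(t)|\,dt\right]$$
$$\le 2u^{v/6-1}\bar\alpha_n\left\{|\theta_0|\left[2\int_{-1}^1|K_{0v}(y)|^2\,dy\right]^{1/2} + \left[2\int_{-1}^1|K_{1v}(y)|^2\,dy\right]^{1/2}\right\}.$$

From Lemma A.1 and Lemma 6.1, $|d_n(x)| \to 0$ uniformly for $x \in [c_{1n}, c_{2n}]$. Since $u^{v/2} \sim n^{-1/2}$, (i) is proved. Next we prove (ii). For $x \in [c_{1n}, c_{2n}]$ and $|t| \le 1$, $h(x + ut) \ge u$ by Lemma 4.1, and

$$\sigma_n^2(x) \le E\left[\frac{\theta_0}{u}\cdot\frac{K_{0v}\left(\frac{X_j - x}{u}\right)}{h(X_j)} - \frac{1}{u^2}\cdot\frac{K_{1v}\left(\frac{X_j - x}{u}\right)}{h(X_j)}\right]^2 = u^{-3}\int\int_{-1}^1[\theta_0uK_{0v}(t) - K_{1v}(t)]^2c(\theta)e^{\theta(x+ut)}[h(x + ut)]^{-1}\,dt\,dG(\theta)$$
$$\le l_1^2u^{-4}v^3\int c(\theta)e^{\theta x}e^{|\theta|u}\,dG(\theta) \le l_1^2u^{-5}v^3,$$

where the last step uses Lemma 6.1. In particular, for $x \in [\eta_1, \eta_2]$, letting $\underline h = \min\{h(x + ut) : x \in [\eta_1, \eta_2], |t| \le 1\}$,

$$\sigma_n^2(x) \le l_1^2u^{-3}\underline h^{-1}v^3\int c(\theta)e^{\theta x}e^{|\theta|u}\,dG(\theta) \le l_3^2u^{-3}v^3.$$

It is easy to see that $\sigma_n^2(x) \ge l_2^2$. We prove (iii) next. From Lemma A.1, for $i = 0$ or $1$, $|K_{iv}(t)| \le kv^{10}36^v$. Also note that $K_{iv}(t) = 0$ if $|t| > 1$. Then

$$\left|K_{iv}\left(\frac{y - x}{u}\right)\frac{1}{h(y)}\right| \le \frac{kv^{10}36^v}{h(y)}\,I[c_{1n} - u \le y \le c_{2n} + u] \le kv^{10}36^vu^{-1}.$$

For $x \in [c_{1n}, c_{2n}]$, $E[|Z_{jn}(x)|^3] \le 2kv^{10}36^vu^{-1}E[Z_{jn}^2(x)] \le l_4v^{13}36^vu^{-6}$. The proof of Lemma 6.3 is complete.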

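Proof of Lemma 6.4. From Lemma 6.3, $|w_n(x) - w(x)| \le 1/\sqrt n$ for all $x \in [c_{1n}, c_{2n}]$. If $w(x) \ge d_n$ and $n$ is large, then

$$\frac{w_n(x)}{w(x)} \ge \frac{w(x) - d_n + d_n - |w_n(x) - w(x)|}{w(x) - d_n + d_n} \ge \frac{d_n - |w_n(x) - w(x)|}{d_n} \ge \frac12,$$

where the last step holds because $\sqrt n\,d_n = (v/u)^{3/2} \to \infty$, so $|w_n(x) - w(x)| \le 1/\sqrt n \le d_n/2$. Similarly, we can prove that $w(x) \le -d_n$ implies $w_n(x) \le w(x)/2$.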